
COO-1469-0001


A  FAST  TRANSLATION  TECHNIQUE 
WHICH  PARTIALLY  OPTIMIZES  THE  OBJECT  CODE 


by 


C.  W.  Gear 




April  5,  1965 




Report  No.  172 




Department  of  Computer  Science 

University  of  Illinois 

Urbana,  Illinois 


Abstract 

A  three-pass  compiler  is  briefly  described  with  the  following 
properties: The last two passes scan an intermediate language produced by the
preceding  pass  in  essentially  the  reverse  of  the  order  in  which  it  was  generated, 
so  that  the  first  pass  is  the  only  one  which  has  to  read  the  relatively  bulky 
problem  oriented  input.   The  double  scan,  one  in  either  direction,  performed  by 
the  first  two  passes,  allows  the  compiler  to  remove  locally  constant  expressions 
and  recursively  calculable  expressions  from  loops  and  to  do  the  important  part 
of  common  subexpression  recognition.   Optimization  such  as  the  effective  use  of 
index  registers,  although  as  important,  is  not  discussed  since  the  object  code 
which  would  be  most  efficient  is  highly  machine  dependent.   The  discussion  is  in 
terms of a FORTRAN-like language, although the technique is applicable to most
algebraic languages.


1.   Introduction

Existing compilers tend to fall into two classes: those that are
relatively fast, very often syntax-directed, one- or two-pass programs with
either an inefficient or an interpretive object, and those which spend a large
amount of time optimizing the object for the particular machine. Descriptions
of the latter do not often appear in print because of their highly
machine-dependent nature. (However, descriptions of IBM FORTRAN II and the KDF 9
optimizing ALGOL [3] are available.)

The  arguments  in  favor  of  this  situation  are  well  known.  Code  checks 
are  run  on  the  fast  compiler  and  then  an  optimized  object  is  obtained  from  the 
other  for  production. 

It has been the experience of this author that a large number of
medium-sized jobs seldom remain unchanged. Before each production run small
changes are made at the source language level. Both because of this, and
because the source language is a condensed, readily comprehensible representation
of the job, it is highly desirable to seek a middle ground with some of the
features of both worlds.

The  purpose  of  this  paper  is  to  describe  a  method  for  the  high-speed 
compilation  of  code  which  has  been  rearranged  so  as  to  optimize  it  in  terms 
of the number of basic operations performed. This technique is machine
independent. Machine-dependent features, such as index registers, finite length
operand stacks, etc., are not discussed.

The object code of a compiler can be made more efficient in a number of
ways. They include the location of locally constant expressions or expressions
that can be calculated by recursion, location of common subexpressions, and the
rearranging of the calculation to facilitate any of these objectives, such that
either code can be removed from loops or duplicated coding can be removed.
Non-buffered input from auxiliary storage can also be made more efficient within
the operating system by pre-calling the input package to load a buffer. For
example, the FORTRAN statement READ TAPE N, i/o list will cause a wait by the
main frame until tape N has been read into a buffer so that it can be copied to
the i/o list. This wait can be reduced by calling for the buffer fill earlier.

In order to perform these types of optimization it is necessary to
analyze all statements in a neighborhood of a given statement. This neighborhood
is the extent of the loop or block of program from which it is desired to
remove code. Thus most optimization procedures involve a considerable amount of
rescanning, or the construction of very large tables which must be examined
more or less at random.

The requirements for high-speed compilation are that a minimum of
scans of the entire data be used, and that it not be necessary to refer at
random to sections of the program remote from the one being considered, or
equivalently, to very large tables containing sections of the program. For
high-speed computer systems now being designed, it is a further advantage to refer to
memory in a sequential manner rather than in a highly data-dependent manner, since
a larger degree of memory-main frame overlap is then possible using the various
look-ahead schemes.

Thus the requirements of high-speed compilation and efficient object
code production are somewhat incompatible. Any solution is bound to reflect the
assessment of the individual as to their relative importance. In this paper,
high-speed compilation is stressed; optimization is performed within this constraint.

This  paper  will  not  deal  with  the  techniques  involved  in  the  standard 
compilation  process  since  they  are  both  well  known  and  machine  dependent.   It 
will  only  point  out  additions  to  the  processes  which  achieve  optimization. 


2.   An  Outline  of  the  Program 

An  example  will  clarify  the  type  of  optimization  being  sought.   The 
following  section  of  a  FORTRAN  program 

      DIMENSION A(15,5)
      DO 1 I = M1,N1,L1
      DO 2 J = M2,N2,L2
    2 A(I,J) = A(I,J)*C*15.5*D*.5
    1 C = C + 1.

might compile on a fast "single-pass" compiler as the machine language equivalent
of

        Move M1 to I
LOOP1   Move M2 to J
LOOP2   (J-1)*15 + I-1 + A -> T1
        (J-1)*15 + I-1 + A -> T2
        Load indirect from T2
        Multiply by C
        Multiply by 15.5
        Multiply by D
        Multiply by 0.5
        Store indirect into T1
        J + L2 -> J
        Subtract N2 and transfer to LOOP2 if result < 0
        C + 1. -> C
        I + L1 -> I
        Subtract N1 and transfer to LOOP1 if result < 0

where T1 and T2 are temporary storage cells, and A is the memory cell containing
A(1,1).

A fairly efficient object version is given in the form below.

        L1 -> X4
        A - 16 + 15*M2 + M1 -> X3
        L2*15 -> X2
        T2 = D*7.75
        M1 -> I
        Transfer to LOOP1 + 1
LOOP1   X3 + X4 -> X3
        X3 -> X1                           <--  (LOOP1 + 1)
        T2*C -> T1
        M2 -> J
        Transfer to LOOP2 + 1
LOOP2   X1 + X2 -> X1
        Load indirect from X1              <--  (LOOP2 + 1)
        Multiply by T1
        Store indirect in X1
        J + L2 -> J
        Subtract N2 and transfer to LOOP2 if result < 0
        C + 1. -> C
        I + L1 -> I
        Subtract N1 and transfer to LOOP1 if result < 0

The  basic  steps  that  have  been  taken  in  this  example  are: 

a)  The locally constant expression C*15.5*D*.5 was removed from the
inner loop since C and D are not changed in that loop. Since C but
not D is changed in the outer loop the 15.5*D*.5 part can also be
removed from that loop.

b)  The expression (J - 1)*15 + I - 1 + A is recognized to be
recursively calculable in the inner loop by starting at
(M2 - 1)*15 + I - 1 + A and adding increments of 15*L2. Then the
starting expression is also recognized as recursively calculable
in the outer loop.

c)  The expression 15.5*D*.5 is rearranged to get D*7.75.

d)  The occurrence of the expression (J - 1)*15 + I - 1 + A twice is
recognized.
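
The effect of steps (a) to (d) can be checked with a small simulation. The following is only a sketch in Python, not the object code of the paper: the array A(15,5) is modelled as a flat column-major list, the function names `naive` and `optimized` are hypothetical, and positive loop steps with at least one iteration are assumed.

```python
def naive(a, c, d, m1, n1, l1, m2, n2, l2):
    """Direct transcription of the source loops; `a` is a flat, column-major
    copy of A(15,5), so A(I,J) lives at index (J-1)*15 + (I-1)."""
    a = list(a)
    for i in range(m1, n1 + 1, l1):
        for j in range(m2, n2 + 1, l2):
            addr = (j - 1) * 15 + (i - 1)
            a[addr] = a[addr] * c * 15.5 * d * 0.5
        c += 1.0
    return a, c


def optimized(a, c, d, m1, n1, l1, m2, n2, l2):
    """The same computation after steps (a) to (d)."""
    a = list(a)
    t2 = d * 7.75                      # (a), (c): constant in both loops
    x3 = (m2 - 1) * 15 + (m1 - 1)      # (b): index of A(M1, M2)
    x2 = l2 * 15                       # (b): inner-loop address increment
    for i in range(m1, n1 + 1, l1):
        t1 = t2 * c                    # (a): constant in the inner loop only
        x1 = x3
        for j in range(m2, n2 + 1, l2):
            a[x1] = a[x1] * t1         # (d): one address serves load and store
            x1 += x2
        c += 1.0
        x3 += l1                       # (b): outer-loop address increment
    return a, c


# Sample data chosen so that every product is exactly representable.
args = ([float(k) for k in range(75)], 1.0, 2.0, 1, 15, 2, 1, 5, 1)
same = naive(*args) == optimized(*args)
```

With general floating-point data the two versions may differ in the last bits, because step (c) changes the order of the multiplications.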

The  first  two  of  these  steps  may  only  be  taken  if  the  variables 
concerned  are  either  not  changed  inside  the  loop,  or  are  changed  in  an  incremental 
manner.   It  is  therefore  necessary  to  examine  the  range  of  each  loop  for  potential 
changes.   The  assumptions  made  about  function  or  subroutine  calls  are  crucial  in 
this  problem.   If  side  effects  are  allowed,  then  it  is  necessary  either  to 
tabulate  information  about  the  changes  effected  by  each  such  procedure,  or  else 
to  assume  that  everything  may  be  changed  and  abandon  this  optimization  in  any 
loop  containing  such  a  call. 

In order to gather the information required about each loop in the
program in a straightforward manner Pass I maintains a stack $S_V$, each of whose
entries is a simple list of variable names. The top level of the stack
corresponds to the iterative loop of the program currently being scanned and
lower levels correspond to each of the loops within which the current one is
nested. As each statement is scanned, the names of variables which are changed
by the statement are added to the top level of $S_V$. When the end of a loop is
reached, the top level contains the list of all changed variables. This list
is saved and also added to the second level of $S_V$, so that when the old top
level is discarded, the new top level contains the current set of changed
variables for the next outer loop.

Pass  I  also  translates  the  source  input  into  a  more  compact  internal 
form  where,  for  example,  arithmetic  expressions  are  in  a  polish  postfix  notation. 
In  order  not  to  have  to  save  each  of  the  lists  of  changed  variables  in  main  core, 
they  also  are  added  to  the  intermediate  language  output  from  Pass  I.   They  occur 
at  the  end  of  the  block  to  which  they  refer. 
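
The bookkeeping for the changed-variable stack can be sketched as follows. This is an illustrative Python model, not the author's implementation: the statement encoding (`"do"`, `"assign"`, `"end"` tuples) and the function name `pass1_changed_lists` are assumptions made for the example.

```python
def pass1_changed_lists(statements):
    """Copy the statement stream, inserting a ("changed", names) record after
    the end of every loop, as Pass I does in its intermediate output."""
    stack = [set()]              # bottom level: code outside all loops
    out = []
    for st in statements:
        out.append(st)
        if st[0] == "do":        # entering a loop opens a new top level
            stack.append(set())
        elif st[0] == "assign":  # the assigned name is changed in this loop
            stack[-1].add(st[1])
        elif st[0] == "end":     # leaving a loop: emit and merge outwards
            changed = stack.pop()
            out.append(("changed", sorted(changed)))
            stack[-1] |= changed
    return out


# The example of Section 2: DO I / DO J / A(I,J) = ... / end J / C = ... / end I
stream = pass1_changed_lists([
    ("do", "I"), ("do", "J"), ("assign", "A"), ("end",),
    ("assign", "C"), ("end",),
])
```

The controlled variable of a DO is deliberately not recorded here, since the text tests separately whether it is changed other than by the loop control.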

To handle recursive calculation, a second stack of loop parameters (which
could be part of $S_V$) is maintained by Pass I. The loop parameters are entered
into the top of this stack when they are read so that they can be moved to the
end of the loop, where a test can be made to see if the controlled variable or
the step quantity is changed inside the loop.

Pass  I  will  also  perform  the  syntax  checking  of  the  source  language, 
and,  for  efficiency,  it  should  replace  all  identifiers  by  an  internal  form  such 
as  a  table  address. 

Pass II basically scans the output from Pass I in reverse order. This
makes the list of changed variable names and other loop information available
to the program just as the first (in time) statement of a loop is read. Pass II
maintains a stack $S_V$ by placing the changed variable list on top of $S_V$ when the
end of each loop is entered, and removing it when the beginning of the loop is
left. Thus during the processing of any loop, the top of the stack $S_V$ contains
the list of variables changed in that loop. Similarly a loop-parameter stack
contains at its top the current loop control parameters. Additional stacks $S_L$
and $S_R$, whose entries are also lists, are maintained by Pass II. These stacks
contain respectively the lists of strings representing the local constants and
recursive increments which are in the process of being moved to the beginning of
a loop.
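
A minimal sketch of the reverse scan, under the same assumed statement encoding as a Pass I output stream (`"do"`, `"assign"`, `"end"`, and `"changed"` records; all hypothetical names), shows why the changed list is available for every statement of a loop without any look-ahead:

```python
def pass2_annotate(stream):
    """Scan the Pass I stream backwards and report, for each assignment,
    the changed-variable list of the innermost loop containing it."""
    s_v = []                     # stack of changed-variable sets
    seen = []
    for item in reversed(stream):
        if item[0] == "changed":     # met first when the end of a loop is entered
            s_v.append(set(item[1]))
        elif item[0] == "do":        # beginning of the loop is left
            s_v.pop()
        elif item[0] == "assign":
            seen.append((item[1], sorted(s_v[-1]) if s_v else []))
    return seen


stream = [
    ("do", "I"), ("do", "J"), ("assign", "A"), ("end",), ("changed", ["A"]),
    ("assign", "C"), ("end",), ("changed", ["A", "C"]),
]
annotated = pass2_annotate(stream)
```

Statements are reported in the order Pass II meets them, that is, last statement first.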

The output of Pass II is in a machine language with symbolic addresses.
The length of the program being generated can be calculated during the second
pass, so that at its completion, variable storage can be assigned. Pass III
will then produce a relocatable binary object.

The method described below involves operations such as concatenating
elements and strings to form larger strings, where these elements or strings are
members of stacks. Deliberately, no mention is made of how to mechanize this
since it is highly machine dependent. On some machines it is better to chain
everything and not to move elements unless absolutely necessary; on other
machines the opposite holds. These tricks of the programmer's trade can easily
save or cost a factor of 2 in speed, so a judgment of the method should include
an evaluation of their potential value in this problem.


3.   Removing  Locally  Constant  Subexpressions

This section indicates the manner in which Pass II scans arithmetic
expressions in order to locate locally constant subexpressions. The information
it has available is contained in the top entry of the stack $S_V$, which contains
a list of changed names. For speed it is convenient on most machines to
translate this information so that it is represented by a bit in the name table.

Arithmetic  expressions  have  been  translated  by  Pass  I  into  a  polish 
postfix  form.   Although  the  main  Pass  II  scan  is  proceeding  backwards,  it  is 
more  convenient  to  scan  expressions  forward.   This  can  be  handled  either  by 
having  Pass  I  reverse  the  order  of  the  expression  or,  probably  more  conveniently, 
have  Pass  I  generate  a  pointer  at  the  end  of  an  expression  indicating  the 
beginning  of  the  expression. 

As the polish postfix form is scanned from left to right, operand
names are read and placed in an operand stack $S_O$. As they are put in this stack
they are marked as changed variables or not depending on whether they appear in
the top of $S_V$ or not. When operations are read in from the polish postfix
string they must be compiled. The nature of the compiling action depends on
the operands involved. Consider only binary operations. (n-ary operations
follow the same principle.) If neither of the top two operands in $S_O$ is
changed, no code is generated; rather, the top two levels are replaced with a
single entry representing the combination. For example, if the top two levels
are both locally constant variables (LCV) they are replaced by a single
locally constant expression (LCE). This is mechanized by forming a polish postfix
string consisting of the top two operands followed by the operation, placing
this in main store and placing a pointer to it in the stack with the mark LCE.
Thus the entries in the operand stack can be any one of the classes


C    (constant, e.g., 15.5)
LCV  (locally constant variable)
LCE  (locally constant expression)
V    (variable, one that is changed in the loop)

The rules for the combination of these by any binary operation are:

1st and 2nd level operands     Action

C - C                          Combine at compile time to get new constant.

C - LCV, C - LCE,              Form new string consisting of the concatenation
LCV - LCV, LCV - LCE,          of 1st level of $S_O$, 2nd level of $S_O$, operation.
and LCE - LCE                  Single stack entry becomes LCE pointing to this
                               string.

C - V, LCV - V, V - V          Compile object code directly.
                               Single stack entry becomes V type.

LCE - V                        Allocate a temporary storage cell with name T, say.
                               Compile code for T, V, operation.
                               Add 'T =' to the end of the string representing
                               the LCE and add this new string to the top level
                               of the $S_L$ stack, the stack of locally constant
                               expressions. The single $S_O$ stack entry becomes
                               V type.

When the beginning of the loop is reached in the Pass II scan, the loop is closed,
the top level of $S_V$ discarded and the polish postfix assignment statements
removed from the top of the $S_L$ stack and compiled to yield the program to generate
the local constants. Since these expressions may be partially or wholly local
constants in this outer loop, they should be handled in the same manner as Pass I
generated expressions; that is, all that is necessary is to switch the input
from the intermediate language of Pass I to this list until it is exhausted.
Since the temporary storage name generated for a locally constant expression
is not marked as changed, the rules outlined above will cause an expression
to be removed from all loops in which it does not change.
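
The combination rules for binary operations can be sketched as below. This is a simplified Python model under stated assumptions: tokens are numbers, variable names, or one of `+`, `-`, `*`; the generated names `T1`, `R1` and the function name `scan_postfix` are hypothetical; and "compiled code" is represented by tuples rather than machine orders.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def scan_postfix(tokens, changed):
    """Scan one postfix expression inside a loop whose changed-variable set
    is `changed`. Returns (code left in the loop, strings moved out of it)."""
    stack, code, hoisted = [], [], []
    n_temp = n_result = 0

    def tokens_of(entry):
        kind, value = entry
        return list(value) if kind == "LCE" else [value]

    for tok in tokens:
        if tok not in OPS:                       # operand: classify and stack it
            if isinstance(tok, (int, float)):
                stack.append(("C", tok))
            elif tok in changed:
                stack.append(("V", tok))
            else:
                stack.append(("LCV", tok))
            continue

        right, left = stack.pop(), stack.pop()
        kinds = (left[0], right[0])
        if kinds == ("C", "C"):                  # fold at compile time
            stack.append(("C", OPS[tok](left[1], right[1])))
        elif "V" not in kinds:                   # still locally constant
            stack.append(("LCE", tokens_of(left) + tokens_of(right) + [tok]))
        else:                                    # code must stay in the loop
            names = []
            for kind, value in (left, right):
                if kind == "LCE":                # move 'LCE T =' out of the loop
                    n_temp += 1
                    temp = f"T{n_temp}"
                    hoisted.append(list(value) + [temp, "="])
                    names.append(temp)
                else:
                    names.append(value)
            n_result += 1
            result = f"R{n_result}"
            code.append((result, names[0], tok, names[1]))
            stack.append(("V", result))
    return code, hoisted


# X * (C * 15.5 * D), where only X is changed in the loop
code, hoisted = scan_postfix(["X", "C", 15.5, "*", "D", "*", "*"], {"X"})
```

Feeding each hoisted string through the same scan with the changed set of the next outer loop mirrors the switch of input described above.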

General  n-ary  operations  such  as  functions  can  be  handled  by  the 
obvious  extension  of  the  process.   Indexing  is  an  extension  of  functions. 
If  the  indexing  expressions  are  local  constants  then  the  address  of  the  element 
can  be  calculated  outside  of  the  loop.   If  in  addition  no  elements  of  the  array 
are  changed,  then  the  element  could  be  part  of  a  larger  locally  constant 
expression. 

4.   Recursive  Calculation

Recursive calculation of addresses has been discussed by Samelson
and Bauer for ALGOL. Restrictions essentially similar to those of FORTRAN
are  imposed  on  the  subscripts.   The  method  proposed  here  is  to  calculate  all 
expressions  recursively  where  that  can  be  done.   If  the  expression  happens  to 
be  an  index  expression,  then  a  compiler  for  a  particular  machine  can  take  note 
of  it  and  use  index  registers  or  other  features  to  perform  the  calculation. 

The technique is essentially identical to the one used for identifying
local constants. A new type of operand stack entry is defined, the "Step
Variable" (SV). This is a variable which is incremented by constant amounts
during the execution of a loop. The controlled variable in the loop is of this
type provided that neither it nor the step quantity is changed inside the loop.
This is the reason for moving the controlled variable name and loop parameters
to the end of the loop by the loop-parameter stack mechanism in Pass I. As the
end of a loop is entered, the new top level is added to the loop-parameter stack,
and the loop count mechanism is compiled. If the expression used for the step
quantity is a fixed point constant or local constant, the controlled variable is
checked against the changed list. If it is not changed in the loop (except by the
loop control) it is flagged in the table as a step variable for the duration of
this loop. (If the controlled variable of the next outer loop is also flagged as
a step variable, its flag must be turned off until the beginning of the loop has
been reached.)

Another type of operand stack entry is defined, a "Step Expression"
(SE). An SE is a combination, by addition or subtraction, of any two SV's, SE's,
LCE's, LCV's or C's, one of which is an SV or SE, or a combination, by
multiplication, of an SE or SV with an LCE, LCV or C type.

Thus  a  step  expression  can  be  expressed  in  the  form  LCEA  +  LCEB*I 
where  I  is  the  step  variable  and  LCEA  and  LCEB  are  two  C's,  LCV's  or  LCE's. 
Since  I  is  being  incremented  by  a  locally  constant  amount,  say  INC,  the  value 
of the step expression can be obtained by starting at LCEA + LCEB*S and
incrementing each time by LCEB*INC, where S is the starting value of I.
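
As a small check of this recurrence, the following Python sketch (function names are hypothetical, and integer values are assumed so that the comparison is exact) evaluates a step expression both directly and by repeated addition:

```python
def step_values_direct(lcea, lceb, start, inc, count):
    """Evaluate LCEA + LCEB*I for I = start, start+inc, ... (one multiply each)."""
    return [lcea + lceb * (start + k * inc) for k in range(count)]


def step_values_recursive(lcea, lceb, start, inc, count):
    """Same values, using one addition per iteration inside the loop."""
    value = lcea + lceb * start      # initial value, computed before the loop
    delta = lceb * inc               # increment, also computed before the loop
    values = []
    for _ in range(count):
        values.append(value)
        value += delta
    return values
```

For example, with LCEA = 100, LCEB = 15, S = 2 and INC = 3, both functions yield 130, 175, 220, 265, 310 for the first five iterations.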

The procedure is mechanized by marking a variable in the operand
stack $S_O$ as an SV if it is so marked in the variable table. When two (or more)
operands are combined which result in an SE, the entry in the stack is so marked,
and a pointer is placed there which points to a pair of strings. The first
string represents LCEA and the second represents LCEB.

When  the  operation  is  an  addition  (or  subtraction),  the  new  lists  are 
formed  by  concatenating  the  pairs  of  strings  representing  each  operand  together 
with  a  trailing  plus  (minus)  sign.   If  either  of  the  expressions  is  simpler  than 
an  SE,  then  its  LCEA  and  LCEB  are  implied  in  the  obvious  way. 

When the operation is multiplication of an SE or SV by a C, LCV or
LCE, the new LCEA is the string

        LCEA  {C | LCV | LCE}  *

and the new LCEB is similarly

        LCEB  {C | LCV | LCE}  *

where the braces enclose the alternatives for the multiplier.
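
The string manipulation for step expressions can be sketched in Python as follows. This is an illustration only: an SE is modelled as a pair of postfix token lists (LCEA, LCEB), the helper names are hypothetical, and the "obvious" implied pairs are assumed to be (0, 1) for a bare step variable and (X, 0) for a locally constant operand X.

```python
def se_step_variable():
    """A bare step variable I is 0 + 1*I."""
    return ([0], [1])


def se_local(tokens):
    """A constant, LCV or LCE has no step part: X + 0*I."""
    return (list(tokens), [0])


def se_add(x, y, op="+"):
    """Addition or subtraction: concatenate the strings pairwise, then the sign."""
    return (x[0] + y[0] + [op], x[1] + y[1] + [op])


def se_mul(x, factor):
    """Multiplication by a C, LCV or LCE: append the factor and '*' to both strings."""
    return (x[0] + list(factor) + ["*"], x[1] + list(factor) + ["*"])


def eval_postfix(tokens, env):
    """Evaluate a postfix token list; names are looked up in `env`."""
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a - b if tok == "-" else a * b)
        else:
            stack.append(env[tok] if isinstance(tok, str) else tok)
    return stack.pop()


# (J - 1)*15 + (I - 1 + A) in the inner loop, where J is the step variable
se = se_add(se_mul(se_add(se_step_variable(), se_local([1]), "-"), [15]),
            se_local(["I", 1, "-", "A", "+"]))
lcea = eval_postfix(se[0], {"I": 3, "A": 1000})
lceb = eval_postfix(se[1], {"I": 3, "A": 1000})
```

The strings produced this way contain redundant terms such as `1 0 -`; the reorganization discussed in Section 6 is what would tidy them.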

In all other cases involving an SE or SV, some object code is to be
compiled. If the operand is either an SV or an SE with its LCEB equal to +1,
then there is no point to handling it as other than a changed variable (with
possibly a locally constant additive LCEA). Apart from these cases, it is
necessary to assign a temporary storage cell, say T1, to the SE for the purpose
of compiling the current operation. A second temporary storage cell T2 must
also be provided for the increment storage. The strings LCEA LCEB S * + T1 =
and LCEB INC * T2 = are added to the top list of the stack $S_L$ of locally constant
expressions and T1 T2 + T1 = is added to the top of the $S_R$ stack of recursive
increments.

When the beginning of the loop is reached in the scan, it is necessary
to prepare the incrementing orders by compiling the top level of the $S_R$ stack
and then discarding it. The loop is then closed by removing the top level of the
loop-parameter stack and compiling the loop initialization plus the transfer
around the incrementation orders before compiling the top list of the $S_L$ stack.

Just as the multiplication operation has been reduced to successive
additions, the exponentiation operation could be reduced to successive
multiplications, but this author feels that it is of doubtful value, both because
of rounding error and frequency of occurrence.


5.    Rearranging  the  Calculation  of  an  Expression 

Whether or not this should be done is open to question numerically.
The fact that some compilers work from left to right, others right to left and
still others in some optimizing manner means that if the programmer desires to
calculate A+B+C in a specific order, he should indicate it with parentheses,
e.g., A+(B+C), if he intends to use the program in more than one installation.
In order to indicate the difference between the three expressions A+B+C, A+(B+C)
and (A+B)+C in the polish postfix string, a convention must be used which allows
the rearrangement of calculation in the first instance. Such a convention is to
allow n-ary versions of operations. Thus A+B+C becomes $A\,B\,C\,+_3$. The subscript 3
indicates that + is the summation of the 3 elements A, B and C. A+(B+C) and
(A+B)+C would then uniquely be $A\,B\,C\,+\,+$ and $A\,B\,+\,C\,+$, where the subscript 2
for binary operations has been dropped. In order that this convention also handle
the subtraction (and division) operations the "sign" of the operation is placed
over the last element representing that variable. Thus A-B+C is $A\,\bar{B}\,C\,+_3$ while
A-(B+C) is $A\,B\,C\,\bar{+}\,+$ and A+(-B+C) is $A\,\bar{B}\,C\,+\,+$. Similarly A/B*C and
A/(B*C) are $A\,\bar{B}\,C\,*_3$ and $A\,B\,C\,\bar{*}\,*$ respectively.

When the operation $+_3$ in the expression $A\,B\,C\,+_3$ is to be compiled by
Pass II, the objective is to rearrange the elements so that the minimum of work
is done at execution time. This has been discussed in the case of constants by
Floyd and Gear. There is a hierarchy of operands that are possible in
the stack at any time. This ordering is

Constant
Locally Constant Expression (including Locally Constant Variable)
Step Expression (including Step Variable)
Variable


When an n-ary commutative operation + or * is encountered by Pass II, the top
n levels of the operand stack $S_O$ are examined to find all occurrences of constants
first; these are combined at compile time and n is appropriately reduced. If
it  is  not  now  1,  the  remaining  levels  are  examined  for  LCE's  or  LCV's.   If  there 
are  any,  they  are  combined  with  any  constant  to  give  a  new  LCE.   If  n  still  has 
not  been  reduced  to  1,  the  search  continues  for  SE's  or  SV's  which  can  be 
legitimately  combined  into  SE's.   Finally,  if  n  is  not  1,  code  is  compiled  and 
the  stack  entry  becomes  a  variable. 
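
The first two stages of this search can be sketched as follows. The Python below is a simplified illustration with hypothetical names: it handles only unsigned `+` and `*`, represents operands as (class, value) pairs, and leaves step operands and variables untouched rather than combining them.

```python
from functools import reduce
import operator

FOLD = {"+": operator.add, "*": operator.mul}


def combine_nary(op, operands):
    """Regroup the operands of an n-ary '+' or '*': fold the constants,
    then merge all locally constant operands into one LCE string."""
    consts = [value for kind, value in operands if kind == "C"]
    local = [(kind, value) for kind, value in operands if kind in ("LCV", "LCE")]
    rest = [(kind, value) for kind, value in operands
            if kind not in ("C", "LCV", "LCE")]

    merged = []
    if consts:                                   # stage 1: compile-time folding
        merged.append(("C", reduce(FOLD[op], consts)))
    if local:                                    # stage 2: build one LCE
        tokens = []
        for index, (kind, value) in enumerate(local):
            tokens += list(value) if kind == "LCE" else [value]
            if index > 0:
                tokens.append(op)
        if merged:                               # folded constant goes last
            tokens += [merged.pop()[1], op]
        merged.append(("LCE", tokens) if len(tokens) > 1 else local[0])
    return merged + rest


# A(I,J) * C * 15.5 * D * .5 in the inner loop of the Section 2 example
result = combine_nary("*", [("V", "A(I,J)"), ("LCV", "C"), ("C", 15.5),
                            ("LCV", "D"), ("C", 0.5)])
```

Here the five operands reduce to two: one LCE that can leave the loop and the variable A(I,J) that cannot.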


6.    Common  Subexpressions 

Floyd    has  described  one  way  of  locating  common  subexpressions 
within  a  single  expression.   This  method  could  be  extended  to  a  number  of 
expressions  within  a  program  segment  at  a  considerable  cost  in  main  storage 
requirements.   It  seems  doubtful  that  any  simple  technique  can  be  found  to  handle 
this  problem  without  considerable  searching  and  comparing.   This  author  feels 
that  in  general  the  pay-off  for  the  amount  of  work  involved  is  marginal  for  the 
following reasons:

a)  If  the  common  subexpression  is  very  short,  e.g.,  A+B  or  A*B, 
the  gain  in  execution  speed  is  small  since  in  most  cases  a 
store  order  is  involved  to  save  the  result  in  place  of  the 
common  arithmetic  operation. 

b)  If  the  common  subexpression  is  lengthy,  the  laziness  of  the 
programmer  can  be  relied  upon  to  do  the  necessary  simplification, 

e.g.,   ... (A+B)*(C+D)/(E+(A+B)*(C+D)) ...

is likely to appear as

T = (A+B)*(C+D)

... T/(E+T) ...

which  is  a  considerable  saving  in  writing. 


However, when a feature of the language expresses a fairly involved expression
in a simple manner, the recognition of common subexpressions becomes advantageous.
Such a case is indexing. For example, the assignment statement for
three-dimensional Gauss-Seidel iteration is

A(I,J,K) = (A(I,J,K-1) + A(I,J,K+1) + A(I,J-1,K) + A(I,J+1,K)
           + A(I-1,J,K) + A(I+1,J,K))/6.

This contains the expression ((K-1)*N2 + J-1)*N1 + I-1 + A plus various constants
seven times, where N1 and N2 are the dimensions of the first two subscripts of A.
The method proposed here is not to look for such common subexpressions unless
they are either local constants or step functions. If this is done, then each
time that a new step function or local constant is generated, it should be
compared against the existing set in the top level of $S_L$ before being entered
there. A step function contains two local constants. If the step part LCEB
agrees with the step part of another step function (the names of these are
located in $S_R$), then only one incremented function is needed; the other can be
determined by calculating the difference between the two constant parts LCEA and
adding this relocation at the time it is used. If these are identical, so much
the better.

In many situations, such as the example above, the two constant parts
LCEA of two step functions will only differ by a fixed constant, but this fact
will be hidden by the arrangement of the polish postfix strings representing
them. For example, if I is the step variable, and it is being incremented by 1
starting at 1, the LCEA part of A(I,J,K-1) is

$K\;\bar{2}\;+\;N2\;*\;J\;\bar{1}\;+_3\;N1\;*\;'A'\;+$

whereas for A(I+1,J,K) it is

$K\;\bar{1}\;+\;N2\;*\;J\;\bar{1}\;+_3\;N1\;*\;1\;'A'\;+_3$

where 'A' is the address of A(1,1,1). A direct comparison will not easily yield
the fact that these differ by N2*N1+1.

To aid this recognition, it is proposed that the LCE's be reorganized
as they are generated. A suitable form is such that LCV's and LCE's always appear
before C's in additive groups. They can easily be rearranged as they are removed
from the $S_O$ stack during formation. Additionally, when an LCE of the form
"LCE string, constant, +" is multiplied by a constant such as N2, the value of
"constant * N2" should be calculated, yielding, say, a new constant C. The new
LCE is then "LCE string, N2, *, C, +". In this way, additive constants are moved
to the right hand side of the string (and in an expression like (I+3)*7+J+2
additions are saved). A direct comparison of the LCE's can now be used to
determine if they differ only by a constant. If this is so, the second expression
can be calculated from the first by simple addition. Whether this is better done
outside the loop or at the time of use depends on the machine characteristics.
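
One way to picture the reorganized form is to keep each LCE as a pair: a symbolic postfix string and a single additive constant carried on the right. The Python sketch below is an assumption-laden illustration, not the paper's mechanism: the helper names are hypothetical, and the dimensions N1 = 10 and N2 = 20 are sample values standing in for compile-time constants.

```python
N1, N2 = 10, 20                      # assumed sample dimensions of A


def lce_var(name):
    return ([name], 0)               # symbolic part, additive constant


def lce_add_const(e, c):
    return (e[0], e[1] + c)          # constants accumulate on the right


def lce_mul_const(e, c):
    # (string + k) * c  becomes  string c *  plus the folded constant k*c
    return (e[0] + [c, "*"], e[1] * c)


def lce_add(e, f):
    return (e[0] + f[0] + ["+"], e[1] + f[1])


def constant_difference(e, f):
    """Return f - e when the symbolic parts agree, else None."""
    return f[1] - e[1] if e[0] == f[0] else None


def lcea(di, dj, dk):
    """Constant part of the address of A(I+di, J+dj, K+dk), with I starting at 1."""
    e = lce_mul_const(lce_add_const(lce_var("K"), dk - 1), N2)
    e = lce_add(e, lce_add_const(lce_var("J"), dj - 1))
    e = lce_mul_const(e, N1)
    return lce_add(e, lce_add_const(lce_var("'A'"), di))


diff = constant_difference(lcea(0, 0, -1), lcea(1, 0, 0))
```

With the constants separated out, the symbolic parts of A(I,J,K-1) and A(I+1,J,K) are identical and the difference N2*N1+1 falls out of a direct comparison.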


7.   Auxiliary  Storage  Input

The  reverse  pass  feature  of  this  method  can  be  used  to  handle  this 
optimizing  problem  at  a  simple  level.   During  the  first  pass,  the  input  statement 
is  moved  forward  over  statements  as  far  as  possible  such  that 

a)  No  transfers  into  or  out  of  these  statements  are  possible 
(determined  by  the  nonexistence  of  statement  numbers  or 
control  transfers  in  FORTRAN), 

b)  No  variables  used  or  changed  by  these  statements  are  in  the 
input  list,  and 

c)  None  of  these  statements  refers  to  the  same  i/o  unit. 

During  the  second  pass,  the  call  to  the  buffering  program  or  monitor 
is  moved  back  to  be  the  first  statement  of  the  program  segment,  that  is, 
immediately  following  the  first  transfer  into  or  out  of  the  program. 

8.   Conclusion 

Because the output of one pass is read backwards by the next pass,
this method is particularly suited to machines with tapes that can be read in
either direction. Disk files or drums are also suitable for the intermediate
storage, and can lead to very fast second and third passes.

On a machine with a large main memory many methods of optimization
which  use  large  parts  of  core  are  feasible.   Since  the  first  pass  is  usually 
highly  input  limited,  it  might  be  an  advantage  to  use  the  method  outlined 
above  on  several  programs  simultaneously;  that  is,  the  compiler  program  is 
time-shared  by  several  inputs  and  auxiliary  devices,  each  using  a  small  piece 
of  memory  for  data  storage. 

Partial  optimization  of  the  object  code  in  a  machine  independent 
fashion  is  certainly  feasible  using  a  small  amount  of  main  memory  with  a 
reversible  auxiliary  store.   On  most  machines  it  is  likely  that  the  input  time 
for  the  source  program  will  dominate  the  compile  speed.   Optimization  of  the 
use  of  machine  features  such  as  index  registers  is  a  harder  problem,  probably 
requiring  two  further  passes,  one  in  each  direction,  since  in  the  proposed 
compiler,  information  about  which  addresses  would  best  be  in  index  registers 
is not available until Pass II, and Pass II must generate code whose length can
be determined by the end of Pass II.